
@winner245 (Contributor) commented Oct 28, 2024

This PR optimizes the input iterator overload of assign(_InputIterator, _InputIterator) in std::vector<_Tp, _Allocator> by assigning directly to the already-constructed elements, rather than first destroying all existing elements and then constructing new ones. By eliminating this unnecessary destruction and construction, the new algorithm improves performance by up to 2x for trivial element types (e.g., int in std::vector<int>), up to 2.6x for non-trivial element types such as std::string in std::vector<std::string>, and up to 3.4x for more complex non-trivial types (e.g., std::vector<int> in std::vector<std::vector<int>>).
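A minimal sketch of the same strategy, written against the public std::vector API rather than libc++ internals. `assign_reusing_elements` is a hypothetical name, and `erase` stands in for the internal `__destruct_at_end`. Each input element is read exactly once, so the sketch works with single-pass input iterators:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not libc++ code): assigns the range [first, last) to v,
// reusing v's already-constructed elements instead of destroying them first.
// Each *first is read exactly once, so single-pass input iterators work.
template <class T, class Alloc, class InputIt, class Sentinel>
void assign_reusing_elements(std::vector<T, Alloc>& v, InputIt first, Sentinel last) {
  auto cur = v.begin();
  // Copy-assign over existing elements while both ranges still have elements.
  for (; first != last && cur != v.end(); ++cur, ++first)
    *cur = *first;
  if (cur != v.end()) {
    // Input ran out first: destroy only the surplus tail.
    v.erase(cur, v.end());
  } else {
    // Existing elements ran out first: construct the remainder at the end.
    for (; first != last; ++first)
      v.emplace_back(*first);
  }
}
```

Feeding it `std::istream_iterator<int>` over a `std::istringstream` exercises the genuinely single-pass case; the logic is the same for any iterator category.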

Google Benchmarks

Benchmark tests (libcxx/test/benchmarks/vector_operations.bench.cpp) were run against the assign() implementation before and after this patch. They cover a trivial element type (std::vector<int>) and non-trivial element types (std::vector<std::string> and std::vector<std::vector<int>>).

Before

-------------------------------------------------------------------------------------------------
Benchmark                                                       Time             CPU   Iterations
-------------------------------------------------------------------------------------------------
BM_AssignInputIterIter/vector_int/1024/1024                  1157 ns         1169 ns       608188
BM_AssignInputIterIter<32>/vector_string/1024/1024          14559 ns        14710 ns        47277
BM_AssignInputIterIter<32>/vector_vector_int/1024/1024      26846 ns        27129 ns        25925

After

-------------------------------------------------------------------------------------------------
Benchmark                                                       Time             CPU   Iterations
-------------------------------------------------------------------------------------------------
BM_AssignInputIterIter/vector_int/1024/1024                   561 ns          566 ns      1242251
BM_AssignInputIterIter<32>/vector_string/1024/1024           5604 ns         5664 ns       128365
BM_AssignInputIterIter<32>/vector_vector_int/1024/1024       7927 ns         8012 ns        88579
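The speedup factors quoted in the summary can be recomputed from the Time column above as before/after. `speedup` and `kRows` are illustrative names; the numbers are copied from the two tables:

```cpp
#include <cassert>

// Speedup of the patched assign(): time before / time after (higher is better).
constexpr double speedup(double before_ns, double after_ns) { return before_ns / after_ns; }

// "Time" column (ns) copied from the Before and After tables above.
struct BenchRow {
  const char* name;
  double before_ns;
  double after_ns;
};

constexpr BenchRow kRows[] = {
    {"vector_int", 1157.0, 561.0},           // ~2.06x
    {"vector_string", 14559.0, 5604.0},      // ~2.60x
    {"vector_vector_int", 26846.0, 7927.0},  // ~3.39x
};
```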

winner245 requested a review from a team as a code owner October 28, 2024 02:16
llvmbot added the libc++ label (libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.) Oct 28, 2024

@llvmbot (Member) commented Oct 28, 2024

@llvm/pr-subscribers-libcxx

Author: Peng Liu (winner245)

Changes

Summary

This PR optimizes the vector::__assign_with_sentinel function by reusing existing memory more effectively, resulting in improved performance.

Details

  • Memory Reuse: The new implementation assigns directly to the already-constructed elements instead of destroying all of them first. Destruction now occurs only when the original vector has more elements than the input iterator range. For element types that own resources (e.g., std::string), keeping the existing elements alive lets assignment reuse their memory and can avoid a deallocation/allocation pair per element. This reduced overhead yields roughly a 2.1x speedup for pre-populated vectors whose size stays about the same.

Testing

Benchmark tests (Quick-Bench Results) show significant performance improvements for test cases with pre-populated elements where the vector sizes are about the same before and after assignment:

  • 1000 -> 1000: ~2.1x faster
  • 1000 -> 1: roughly the same
  • 1 -> 1000: roughly the same

where m -> n represents the change in vector size from m to n caused by the assignment.
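The size-change pattern follows from counting element operations. The sketch below tallies them for both strategies, assuming the old path is `clear()` followed by one `emplace_back` per input element and the new path is the one in the diff. `OpCounts`, `old_strategy` and `new_strategy` are hypothetical names, and moves caused by reallocation (e.g., in 1 -> 1000, where both strategies must grow the buffer) are deliberately not counted:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Element-level operations for assigning n input elements to a vector that
// currently holds m elements (reallocation moves are not counted).
struct OpCounts {
  std::size_t assignments = 0;
  std::size_t destructions = 0;
  std::size_t constructions = 0;
};

// Old strategy: clear() destroys all m elements, then emplace_back builds n.
constexpr OpCounts old_strategy(std::size_t m, std::size_t n) {
  return OpCounts{0, m, n};
}

// New strategy: assign over the first min(m, n) elements, then either destroy
// the surplus tail (m > n) or construct the missing elements (n > m).
constexpr OpCounts new_strategy(std::size_t m, std::size_t n) {
  const std::size_t common = std::min(m, n);
  return OpCounts{common, m - common, n - common};
}
```

Only in the 1000 -> 1000 case does the new path replace 1000 destructions plus 1000 constructions with 1000 assignments; in 1000 -> 1 and 1 -> 1000 both strategies perform about the same number of operations, which matches the measurements above.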


Full diff: https://github.com/llvm/llvm-project/pull/113852.diff

1 File Affected:

  • (modified) libcxx/include/__vector/vector.h (+8-3)
diff --git a/libcxx/include/__vector/vector.h b/libcxx/include/__vector/vector.h
index 7889e8c2201ac1..6c37c3113a536a 100644
--- a/libcxx/include/__vector/vector.h
+++ b/libcxx/include/__vector/vector.h
@@ -1031,9 +1031,14 @@ template <class _Tp, class _Allocator>
 template <class _Iterator, class _Sentinel>
 _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void
 vector<_Tp, _Allocator>::__assign_with_sentinel(_Iterator __first, _Sentinel __last) {
-  clear();
-  for (; __first != __last; ++__first)
-    emplace_back(*__first);
+  pointer __cur = __begin_;
+  for (; __first != __last && __cur != __end_; ++__cur, ++__first)
+    *__cur = *__first;
+  if (__cur != __end_)
+    __destruct_at_end(__cur);
+  else
+    for (; __first != __last; ++__first)
+      emplace_back(*__first);
 }
 
 template <class _Tp, class _Allocator>

winner245 force-pushed the winner245/vec_assign_with_sentinel branch from 9f5bbf5 to 83d79b3 on November 7, 2024 20:24

@ldionne (Member) left a comment

LGTM with minor comments. This is great!

c1 = c2;
DoNotOptimizeData(c1);
DoNotOptimizeData(c2);
}
Member

Not attached to this line: Can you please add a release note to 20.rst mentioning this optimization?

Contributor Author

Thank you for your positive feedback and recognition of my work! I appreciate your time and effort in reviewing this PR. I have added a description of this performance optimization to the release notes and rebased the PR onto the main branch. Thanks again for your help and support!

winner245 force-pushed the winner245/vec_assign_with_sentinel branch from 83d79b3 to 03b8721 on November 11, 2024 19:46

@ldionne (Member) left a comment

LGTM!

It would be great if @philnik777 or @frederick-vs-ja could also have a look to make sure I didn't miss something related to conformance.

@philnik777 (Contributor) left a comment

The implementation itself LGTM, but I think we want to rework the benchmarks. I'd also like to see the actual benchmark results and would rather have some text in the commit message than graphics, since I don't think the graphics will show anywhere except on GitHub.

winner245 force-pushed the winner245/vec_assign_with_sentinel branch from c5b2e3f to 87f94b9 on November 12, 2024 16:03

@github-actions bot commented Nov 12, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

winner245 changed the title from "Optimize __assign_with_sentinel in std::vector" to "Optimize input iterator overload of std::vector::assign(first, last)" on Nov 14, 2024
winner245 force-pushed the winner245/vec_assign_with_sentinel branch from 549ba00 to 22e78f4 on November 14, 2024 03:43
winner245 force-pushed the winner245/vec_assign_with_sentinel branch 2 times, most recently from a2e4ff3 to 6112450 on November 17, 2024 20:51

@ldionne (Member) left a comment

The code changes still LGTM, I have a few comments on the benchmarks and most importantly I'd like @philnik777 to chime in to say whether he's satisfied with the benchmarks, since he had requested some changes.

}

template <class IntT>
inline std::vector<std::vector<IntT>> getRandomIntegerInputsWithLength(std::size_t N, std::size_t len) { // N-by-len
Member

Suggested change
- inline std::vector<std::vector<IntT>> getRandomIntegerInputsWithLength(std::size_t N, std::size_t len) { // N-by-len
+ std::vector<std::vector<IntT>> getRandomIntegerInputsWithLength(std::size_t N, std::size_t len) { // N-by-len

inline not needed since this is a template.
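The helper's body is not shown in the review. The sketch below is a hypothetical N-by-len generator written without `inline`, as suggested; it assumes `std::mt19937` with a fixed seed and `std::uniform_int_distribution`, which requires `IntT` to be short, int, long, long long or one of their unsigned counterparts. The real benchmark utility may differ. It can live in a header without `inline` because identical definitions of a function template in several translation units do not violate the one-definition rule:

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Hypothetical body for the N-by-len helper named in the review; the actual
// libc++ benchmark utility may be implemented differently.
template <class IntT>
std::vector<std::vector<IntT>> getRandomIntegerInputsWithLength(std::size_t N, std::size_t len) { // N-by-len
  std::mt19937 gen(12345);  // fixed seed so benchmark inputs are reproducible
  std::uniform_int_distribution<IntT> dist(std::numeric_limits<IntT>::min(),
                                           std::numeric_limits<IntT>::max());
  std::vector<std::vector<IntT>> inputs(N, std::vector<IntT>(len));
  for (auto& row : inputs)
    for (auto& x : row)
      x = dist(gen);
  return inputs;
}
```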

Contributor Author

Thanks.

winner245 changed the title from "Optimize input iterator overload of std::vector::assign(first, last)" to "Optimize std::vector::assign for InputIterator-pair inputs" on Nov 28, 2024
winner245 changed the title from "Optimize std::vector::assign for InputIterator-pair inputs" to "Optimize vector::assign for InputIterator-only pair inputs" on Nov 28, 2024
winner245 force-pushed the winner245/vec_assign_with_sentinel branch from 6112450 to 4fa53ef on November 28, 2024 16:16

@ldionne (Member) left a comment

This LGTM, but please wait for @philnik777 to stamp this since he had comments.

@winner245 (Contributor Author)

@philnik777 Thank you for the approval! I appreciate your time.

philnik777 merged commit 056153f into llvm:main on Nov 28, 2024
62 checks passed
winner245 deleted the winner245/vec_assign_with_sentinel branch November 28, 2024 21:30
ldionne pushed a commit that referenced this pull request Jan 14, 2025
As a follow-up to #113852, this PR optimizes the performance of the
`insert(const_iterator pos, InputIt first, InputIt last)` function for
`input_iterator`-pair inputs in `std::vector` for cases where
reallocation occurs during insertion. Additionally, this optimization
enhances exception safety by replacing the traditional `try-catch`
mechanism with a modern exception guard for the `insert` function.
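To illustrate the general idea of replacing a `try`/`catch` block with an exception guard, here is a minimal RAII sketch (C++17, for class template argument deduction). `ExceptionGuard` and `append_all_or_nothing` are hypothetical names; libc++ has its own internal helper, whose interface is not reproduced here. The rollback runs in the destructor unless `complete()` was called, so it fires only when the guarded code exits via an exception:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical RAII guard: runs `rollback` on destruction unless complete()
// was called first (i.e. only when the guarded scope is left by an exception).
template <class Rollback>
class ExceptionGuard {
public:
  explicit ExceptionGuard(Rollback rollback) : rollback_(std::move(rollback)) {}
  ExceptionGuard(const ExceptionGuard&) = delete;
  ExceptionGuard& operator=(const ExceptionGuard&) = delete;
  ~ExceptionGuard() {
    if (!completed_)
      rollback_();
  }
  void complete() noexcept { completed_ = true; }

private:
  Rollback rollback_;
  bool completed_ = false;
};

// Usage sketch: append every element of r, or leave v unchanged on failure.
template <class T, class Range>
void append_all_or_nothing(std::vector<T>& v, const Range& r) {
  const auto old_size = static_cast<std::ptrdiff_t>(v.size());
  ExceptionGuard guard([&] { v.erase(v.begin() + old_size, v.end()); });
  for (const auto& x : r)
    v.push_back(x);
  guard.complete();  // success: disarm the rollback
}
```

Compared with `try { ... } catch (...) { rollback; throw; }`, the guard keeps the cleanup next to the state it protects and cannot forget to rethrow.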

The optimization targets cases where insertion triggers reallocation. In
scenarios without reallocation, the implementation remains unchanged.

Previous implementation
-----------------------
The previous implementation of `insert` is inefficient in reallocation
scenarios because it performs the following steps separately:
- `reserve()`: the first round, relocating the old elements to new
memory;
- `rotate()`: the second round, reorganizing the existing elements;
- Move-forward: moves the elements after the insertion position to
their final positions;
- Insert: performs the actual insertion.

This approach results in a lot of redundant operations, requiring the
elements to undergo three rounds of relocations/reorganizations to be
placed in their final positions.

Proposed implementation
-----------------------
The proposed implementation jointly optimizes the four steps above so
that each element is placed in its final position in just one round of
relocation. Specifically, this optimization reduces the total cost from
2 relocations + 1 `std::rotate` call to just 1 relocation, with no call
to `std::rotate`, thereby significantly improving overall performance.
Labels

libc++ (C++ Standard Library. Not GNU libstdc++. Not libc++abi.), performance

4 participants